Goto

Collaborating Authors

 Lichinga


Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization

Duan, Shaohua, Li, Xinze, Liu, Zhenghao, Yi, Xiaoyuan, Yan, Yukun, Wang, Shuo, Gu, Yu, Yu, Ge, Sun, Maosong

arXiv.org Artificial Intelligence

Long-context modeling is critical for a wide range of real-world tasks, including long-context question answering, summarization, and complex reasoning tasks. Recent studies have explored fine-tuning Large Language Models (LLMs) with synthetic data to enhance their long-context capabilities. However, the effectiveness of such approaches is often limited by the low diversity and factual inconsistencies in the generated data. To address these challenges, we propose LongMab-PO, a novel framework that leverages a Multi-Armed Bandit (MAB) rollout strategy to identify the most informative chunks from the given long context for sampling high-quality and diverse responses and constructing preference data pairs for Direct Preference Optimization (DPO) training. Specifically, we treat context chunks as arms of MAB, select chunks based on their expected reward scores to input into LLMs to generate responses, and iteratively update these scores based on reward feedback. This exploration and exploitation process enables the model to focus on the most relevant context segments, thereby generating and collecting high-quality and diverse responses. Finally, we collect these generated responses from the rollout process and apply the DPO method to further optimize the LLM. Experimental results show that LongMab-PO significantly improves the diversity and quality of preference data pairs, achieving state-of-the-art performance on long-context reasoning benchmarks.


National level satellite-based crop field inventories in smallholder landscapes

Rufin, Philippe, Hammer, Pauline Lucie, Thomas, Leon-Friedrich, Lisboa, Sá Nogueira, Ribeiro, Natasha, Sitoe, Almeida, Hostert, Patrick, Meyfroidt, Patrick

arXiv.org Artificial Intelligence

The design of science-based policies to improve the sustainability of smallholder agriculture is challenged by a limited understanding of fundamental system properties, such as the spatial distribution of active cropland and field size. We integrate very high spatial resolution (1.5 m) Earth observation data and deep transfer learning to derive crop field delineations in complex agricultural systems at the national scale, while maintaining minimum reference data requirements and enhancing transferability. We provide the first national-level dataset of 21 million individual fields for Mozambique (covering ~800,000 km2) for 2023. Our maps separate active cropland from non-agricultural land use with an overall accuracy of 93% and balanced omission and commission errors. Field-level spatial agreement reached median intersection over union (IoU) scores of 0.81, advancing the state-of-the-art in large-area field delineation in complex smallholder systems. The active cropland maps capture fragmented rural regions with low cropland shares not yet identified in global land cover or cropland maps. These regions are mostly located in agricultural frontier regions which host 7-9% of the Mozambican population. Field size in Mozambique is very low overall, with half of the fields being smaller than 0.16 ha, and 83% smaller than 0.5 ha. Mean field size at aggregate spatial resolution (0.05°) is 0.32 ha, but it varies strongly across gradients of accessibility, population density, and net forest cover change. This variation reflects a diverse set of actors, ranging from semi-subsistence smallholder farms to medium-scale commercial farming, and large-scale farming operations. Our results highlight that field size is a key indicator relating to socio-economic and environmental outcomes of agriculture (e.g., food production, livelihoods, deforestation, biodiversity), as well as their trade-offs.